library(tidyverse)
library(plotly)
Exercises subject to:
Copyright © Sept 27, 2024 by Dr. H. Bozdogan. All rights reserved
Exercise based on Gelman et al. (1995), Bayesian Data Analysis, page 84 (page 94 of the PDF).
Question text rendered like this
Answer text rendered like this
Normal approximations to the posterior distribution (see Gelman et al. (1995), Bayesian Data Analysis, Chapman & Hall/CRC):
If the posterior distribution
\[ \pi (\theta |x)\propto L(x|\theta )\times \pi (\theta ) \tag{1} \]
is unimodal and roughly symmetric, then it is convenient to approximate it by a normal distribution centered at the mode. That is, the logarithm of the posterior density function \(\log \pi (\theta |x)\) is approximated by a quadratic function.
A Taylor series expansion of \(\log \pi (\theta |x)\) centered at the posterior mode, \(\widehat{\theta}\) (where \(\theta\) can be vector and \(\widehat{\theta}\) is assumed to be in the interior of the parameter space \(\Theta\)), gives
\[ \log \pi (\theta |x)=\log \pi (\widehat{\theta}|x)+\frac{1}{2}(\theta -\widehat{\theta})^{\prime }\left[ \frac{\partial ^{2}}{\partial \theta ^{2}}\log \pi (\theta |x)\right] _{\theta =\widehat{\theta}}(\theta -\widehat{\theta})+\cdots \tag{2} \]
where the linear term in the expansion is zero because the log-posterior density has zero derivative at its mode. The second term of (2), considered as a function of \(\theta\), is proportional to the logarithm of a normal density, which gives us the approximation
\[ \pi (\theta |x)\approx N(\widehat{\theta},\mathscr{F}_{Obs}^{-1}(\widehat{\theta})), \tag{3} \]
where \(\mathscr{F}_{Obs}\) is the observed Fisher information,
\[ \mathscr{F}_{Obs}(\theta )=-\frac{\partial ^{2}}{\partial \theta ^{2}}\log \pi (\theta |x). \tag{4} \]
If the mode, \(\widehat{\theta}\), is in the interior of the parameter space, then \(\mathscr{F}_{Obs}(\widehat{\theta})\) is positive; if \(\theta\) is a vector parameter, then \(\mathscr{F}_{Obs}(\widehat{\theta})\) is a matrix.
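As a quick illustration of (3)–(4) outside this exercise, the sketch below applies the normal approximation to a hypothetical binomial posterior under a uniform prior; the data values `y` and `n` are made up, and the mode and observed information are found numerically rather than analytically:

```r
# Hypothetical data: y successes in n Bernoulli trials, uniform prior,
# so the log-posterior is the binomial log-likelihood (up to a constant)
y <- 7
n <- 20
log_post <- function(theta) y * log(theta) + (n - y) * log(1 - theta)

# Find the mode and the observed information numerically
fit <- optim(
  par = 0.5, fn = \(t) -log_post(t),
  method = "Brent", lower = 1e-6, upper = 1 - 1e-6, hessian = TRUE
)
theta_hat <- fit$par        # matches the analytic mode y / n
f_obs <- drop(fit$hessian)  # matches n^3 / (y * (n - y)) analytically
c(mode = theta_hat, approx_sd = sqrt(1 / f_obs))
```

The normal approximation (3) is then \(N(\widehat{\theta}, \mathscr{F}_{Obs}^{-1}(\widehat{\theta}))\).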
Let \(x_{1},\ldots ,x_{n}\) be independent and identically distributed (iid) observations from \(N(\mu ,\sigma ^{2})\) distribution. Assume that both the mean \(\mu\) and the variance \(\sigma ^{2}\) are unknown. For simplicity, consider a uniform prior density for \((\mu ,\log \sigma )\).
Then the log-posterior density is given by
\[ \log \pi (\mu ,\log \sigma |x)=constant-n\log \sigma -\frac{1}{2\sigma ^{2}}\left[ (n-1)s^{2}+n(\bar{x}-\mu )^{2}\right] . \tag{5} \]
- By taking the first partial derivatives of (5) with respect to \(\mu\) and \(\log \sigma\), show that the posterior mode is
\[ (\widehat{\mu},\log \widehat{\sigma})=\left( \bar{x},\frac{1}{2}\log \left(\frac{n-1}{n}s^{2}\right) \right) , \tag{6} \]
where \(\widehat{\mu}=\bar{x}\) and \(\log \widehat{\sigma}=\frac{1}{2}\log \left(\frac{n-1}{n}s^{2}\right) =\log \left( \frac{n-1}{n}s^{2}\right)^{1/2}\) so that
\[ \widehat{\sigma}=\left( \frac{n-1}{n}s^{2}\right) ^{1/2}. \tag{7} \]
Take the partial derivative of (5) with respect to \(\mu\), starting by distributing the \(-\frac{1}{2\sigma^2}\) and dropping terms that don’t involve \(\mu\):
\[ \frac{\partial}{\partial \mu} \left( -\frac{n}{2\sigma^{2}}(\bar{x}-\mu )^{2} \right) \]
Applying the chain rule:
\[ \frac{\partial}{\partial \mu} \left( -\frac{n}{2\sigma^{2}}(\bar{x}-\mu )^{2} \right) = -\frac{n}{2\sigma^{2}} \cdot 2 \cdot (\bar{x}-\mu ) \cdot -1 = \frac{n}{\sigma^2}(\bar{x} - \mu) \]
Setting this equal to 0 and solving for \(\mu\) gives the mode:
\[ \frac{n}{\sigma^2}(\bar{x} - \mu) = 0 \Rightarrow \bar{x} - \mu = 0 \Rightarrow \widehat{\mu} = \bar{x} \]
Take the partial derivative of (5) with respect to \(\log \sigma\), starting by replacing \(\sigma^2\) with \(e^{2 \log \sigma}\) and dropping terms that don’t involve \(\log \sigma\):
\[ \frac{\partial}{\partial \log \sigma}\left(-n\log \sigma -\frac{1}{2 e^{2 \log \sigma}}\left[ (n-1)s^{2}+n(\bar{x}-\mu )^{2}\right]\right) \]
Differentiating (the trickiest piece here: \(\frac{\partial}{\partial \log \sigma} ( \frac{1}{e^{2 \log \sigma}}) = \frac{\partial}{\partial \log \sigma} ( e^{-2 \log \sigma}) = -2e^{-2 \log \sigma} \Rightarrow \frac{\partial}{\partial \log \sigma} ( \frac{1}{2e^{2 \log \sigma}}) = -\frac{1}{e^{2 \log \sigma}}\)) gives
\[ -n + \frac{1}{ e^{2 \log \sigma}}\left[ (n-1)s^{2}+n(\bar{x}-\mu )^{2}\right] \]
Substituting back \(\sigma^2 = e^{2 \log \sigma}\):
\[ -n + \frac{1}{ \sigma^{2}}\left[ (n-1)s^{2}+n(\bar{x}-\mu )^{2}\right] \]
To find the mode with respect to \(\log \sigma\), set this equal to 0 and plug in \(\hat{\mu} = \bar{x}\):
\[ -n + \frac{1}{\sigma^2}\left[ (n-1)s^{2}+n(\bar{x} - \bar{x} )^{2}\right] = 0 \]
Simplifying:
\[ \frac{1}{\sigma^2} (n-1)s^{2} = n \]
Solving for \(\sigma^2\):
\[ \sigma^2 = \frac{n-1}{n} s^2 \]
Taking the square root, then the log:
\[ \widehat{\sigma} = \sqrt{ \frac{n-1}{n} s^2} \Rightarrow \log \widehat{\sigma} = \log \left( \sqrt{ \frac{n-1}{n} s^2} \right) = \frac{1}{2} \log \left( \frac{n-1}{n} s^2 \right) \]
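The closed form in (6) can be sanity-checked by maximizing (5) numerically. The data here are an arbitrary simulated sample (not the samples used below), so only the agreement between the two rows matters:

```r
set.seed(1)  # arbitrary simulated data for the check
x <- rnorm(30)
n <- length(x); xbar <- mean(x); s2 <- var(x)

# Negative of the log-posterior (5), parameterized by (mu, log sigma)
neg_log_post <- function(par) {
  mu <- par[1]; log_sigma <- par[2]
  -(-n * log_sigma - exp(-2 * log_sigma) / 2 * ((n - 1) * s2 + n * (xbar - mu)^2))
}
fit <- optim(c(0, 0), neg_log_post)
rbind(
  numeric     = fit$par,
  closed_form = c(xbar, 0.5 * log((n - 1) / n * s2))
)
```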
Generate \(n=25\), and \(50\) observations from \(N(0,1)\), and contour plot the log-posterior distribution and compute the value of the posterior mode.
# Log-posterior (5), dropping the additive constant
log_posterior <- function(x, mu, sigma) {
n <- length(x)
xbar <- mean(x)
s <- sd(x)
log_sigma <- log(sigma)
-n * log_sigma - 1 / (2 * sigma^2) * ((n - 1) * s^2 + n * (xbar - mu)^2)
}
joint_posterior_grid <- function(x, grid_res = 64) {
n <- length(x)
posterior_mode_mu <- mean(x)
posterior_mode_sigma <- sqrt(var(x) * (n - 1) / n)
posterior_mode_log_sigma <- log(posterior_mode_sigma)
# 1.6 by 1.6 grid around true values
mu_range <- seq(from = -0.8, to = 0.8, length.out = grid_res)
sigma_range <- seq(from = 0.2, to = 1.8, length.out = grid_res)
# Centered around mode option
# offset <- 0.5
# mu_range <- seq(
# from = posterior_mode_mu - offset,
# to = posterior_mode_mu + offset,
# length.out = grid_res
# )
# sigma_range <- seq(
# from = posterior_mode_sigma - offset,
# to = posterior_mode_sigma + offset,
# length.out = grid_res
# )
contour_data <- expand.grid(mu = mu_range, sigma = sigma_range)
contour_data$log_likelihood <- vapply(1:nrow(contour_data), \(i) {
mu_i <- contour_data$mu[i]
sigma_i <- contour_data$sigma[i]
log_posterior(x = x, mu = mu_i, sigma = sigma_i)
}, numeric(1))
log_likelihood_matrix <- contour_data |>
pivot_wider(names_from = mu, values_from = log_likelihood) |>
select(-sigma) |>
as.matrix()
contour_data$mu_mode <- posterior_mode_mu
contour_data$sigma_mode <- posterior_mode_sigma
list(
tall = contour_data,
wide = log_likelihood_matrix,
mu_mode = posterior_mode_mu,
mu_range = mu_range,
sigma_mode = posterior_mode_sigma,
sigma_range = sigma_range
)
}
plot_3d_surface <- function(x, unlog = FALSE, grid_res = 64) {
likelihood <- joint_posterior_grid(x = x, grid_res = grid_res)
if (unlog) {
p <- plot_ly(
x = ~ likelihood$mu_range,
y = ~ likelihood$sigma_range,
z = ~ exp(likelihood$wide)
)
} else {
p <- plot_ly(
x = ~ likelihood$mu_range,
y = ~ log(likelihood$sigma_range),
z = ~ likelihood$wide
)
}
p |>
add_surface(
hovertemplate = ifelse(unlog,
"Mu: %{x}<br>Sigma: %{y}<br>Likelihood: %{z}<extra></extra>",
"Mu: %{x}<br>Log-Sigma: %{y}<br>Log-Likelihood: %{z}<extra></extra>"
)
) |>
layout(
title = list(
text = sprintf(
"Mode mu^ = %.2f sigma^ = %.2f (log sigma^ = %.2f)",
likelihood$mu_mode,
likelihood$sigma_mode,
log(likelihood$sigma_mode)
)
),
scene = list(
xaxis = list(title = "Mu"),
yaxis = list(title = ifelse(unlog, "Sigma", "Log-Sigma")),
zaxis = list(title = ifelse(unlog, "Likelihood", "Log-Likelihood"), showticklabels = FALSE)
)
) |>
hide_colorbar() |>
config(displayModeBar = FALSE, scrollZoom = FALSE)
}
Comparing posteriors for \(n=25\) and \(n=50\).
set.seed(55)
data_samples <- lapply(1:5, \(i) {
x_50 <- rnorm(n = 50, mean = 0, sd = 1)
x_25 <- head(x_50, 25)
list(x_50 = x_50, x_25 = x_25, iter = i)
})
samples <- lapply(data_samples, \(sample) {
x_50 <- sample$x_50
x_25 <- sample$x_25
i <- sample$iter
ll_tall_50 <- joint_posterior_grid(x_50)$tall
ll_tall_25 <- joint_posterior_grid(x_25)$tall
ll_tall_50$iter <- i
ll_tall_25$iter <- i
ll_tall_50$n <- 50
ll_tall_25$n <- 25
# Temper the log-likelihood by scl before exponentiating so the
# normalized contour values stay in a plottable range
scl <- 0.01
ll_tall_50$propto_likelihood <- exp(scl * (ll_tall_50$log_likelihood - max(ll_tall_50$log_likelihood)))
ll_tall_25$propto_likelihood <- exp(scl * (ll_tall_25$log_likelihood - max(ll_tall_25$log_likelihood)))
rbind(ll_tall_25, ll_tall_50)
})
posterior_likelihoods <- do.call(rbind, samples)
ggplot(posterior_likelihoods, aes(y = mu, x = sigma, z = propto_likelihood)) +
geom_contour(binwidth = 0.025) +
facet_grid(paste("n =", n) ~ paste("sample rep =", iter)) +
geom_hline(aes(yintercept = mu_mode), linetype = "dotted", color = "#444444") +
geom_vline(aes(xintercept = sigma_mode), linetype = "dotted", color = "#444444") +
coord_fixed(ratio = 1) +
labs(
title = "Posteriors for mu & sigma (5 reps N(0, 1) w/n=25 & n=50; dotted lines at modes)",
subtitle = "In each rep, the larger sample contains the smaller sample"
)
In general, as \(n\) increases the normal approximation improves and the variance of the estimates decreases.
Some interactive views of the contours.
A view of sample rep = 1 with n = 25 (this relates to the sample in the top left of the plot grids)
x <- data_samples[[1]]$x_25
plot_3d_surface(x, unlog = TRUE)
A view of sample rep = 1 with n = 50 (this relates to the sample in the bottom left of the plot grids)
x <- data_samples[[1]]$x_50
plot_3d_surface(x, unlog = TRUE)
A view of sample rep = 3 with n = 25 (this relates to the sample in the top center of the plot grids)
x <- data_samples[[3]]$x_25
plot_3d_surface(x, unlog = TRUE)
A view of sample rep = 3 with n = 50 (this relates to the sample in the bottom center of the plot grids)
x <- data_samples[[3]]$x_50
plot_3d_surface(x, unlog = TRUE)
These views further highlight the decrease in the variance of our estimates as \(n\) increases.
The visual for sample rep = 3 highlights the move toward better approximating the true values.
- Take the second partial derivatives of (5) to show that the posterior distribution, using (3), can be approximated as a bivariate normal, written out fully in \((8)\) below.
Recalling the log-posterior (5) and its first partial derivatives from earlier:
\[ \log \pi (\mu ,\log \sigma |x)=constant-n\log \sigma -\frac{1}{2\sigma ^{2}}\left[ (n-1)s^{2}+n(\bar{x}-\mu )^{2}\right] \]
\[ \frac{\partial}{\partial \mu} = \frac{n}{\sigma^2}(\bar{x} - \mu) \]
\[ \frac{\partial}{\partial \log \sigma} = -n + \frac{1}{ e^{2 \log \sigma}}\left[ (n-1)s^{2}+n(\bar{x}-\mu )^{2}\right] \]
Finding \(\frac{\partial^2}{\partial \mu^2}\)
\[ \frac{\partial^2}{\partial \mu^2} = \frac{\partial}{\partial \mu} \left( \frac{n}{\sigma^2}(\bar{x} - \mu) \right) = -\frac{n}{\sigma^2} \]
Finding \(\frac{\partial^2}{\partial (\log \sigma)^2}\) (from earlier, \(\frac{\partial}{\partial \log \sigma} ( \frac{1}{e^{2 \log \sigma}}) = -2 \frac{1}{e^{2 \log \sigma}}\) )
\[ \frac{\partial^2}{\partial (\log \sigma)^2} = \frac{\partial}{\partial \log \sigma} \left( -n + \frac{1}{ e^{2 \log \sigma}}\left[ (n-1)s^{2}+n(\bar{x}-\mu )^{2}\right] \right) = -2 \frac{1}{e^{2 \log \sigma}} \left[ (n-1)s^{2}+n(\bar{x}-\mu )^{2}\right] \]
The mixed partial, \(\frac{\partial ^2}{\partial \mu \,\partial \log \sigma} = -\frac{2n}{e^{2 \log \sigma }}(\bar{x}-\mu )\), vanishes at the mode (where \(\widehat{\mu}=\bar{x}\)), so the off-diagonal entries of the information matrix are zero.
According to equation \((4)\) ( \(\mathscr{F}_{Obs}(\theta )=-\frac{\partial ^{2}}{\partial \theta ^{2}}\log \pi (\theta |x)\) )
\[ \mathscr{F}_{Obs}(\mu) = \frac{n}{\sigma^2} \Rightarrow \mathscr{F}_{Obs}^{-1}(\mu) = \frac{\widehat{\sigma}^2}{n} \]
and
\[ \mathscr{F}_{Obs}(\log \sigma) = 2 \frac{1}{e^{2 \log \sigma}} \left[ (n-1)s^{2}+n(\bar{x}-\mu )^{2}\right] = 2 \frac{1}{\sigma^2} \left[ (n-1)s^{2}+n(\bar{x}-\mu )^{2}\right] \]
Plugging in the mode estimates of \(\hat{\sigma} = \sqrt{ \frac{n-1}{n} s^2}\) & \(\hat{\mu} = \bar{x}\):
\[ \mathscr{F}_{Obs}(\log \sigma) = 2 \frac{1}{\sqrt{\frac{n-1}{n} s^2}^2} \left[ (n-1)s^{2}+n(\bar{x}-\bar{x} )^{2}\right] \]
This simplifies to:
\[ \mathscr{F}_{Obs}(\log \sigma) = 2 \frac{n}{(n-1) s^2} (n-1)s^{2} = 2n \Rightarrow \mathscr{F}_{Obs}^{-1}(\log \sigma) = \frac{1}{2n} \]
So the full observed Fisher information matrix is
\[ \mathscr{F}_{Obs} = \left[ \begin{array}{cc} \frac{n}{\widehat{\sigma}^{2}} & 0 \\ 0 & 2n \end{array} \right] \]
And the inverse observed Fisher information matrix (i.e., the covariance matrix \(\Sigma\)) is
\[ \mathscr{F}_{Obs}^{-1} = \left[ \begin{array}{cc} \frac{\widehat{\sigma}^{2}}{n} & 0 \\ 0 & \frac{1}{2n} \end{array} \right] = \Sigma \]
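A numeric check of the diagonal entries (again on an arbitrary simulated sample, not the exercise's data): a finite-difference Hessian of the negative log-posterior at the mode should reproduce \(n/\widehat{\sigma}^2\) and \(2n\), with near-zero off-diagonals:

```r
set.seed(2)  # arbitrary simulated data for the check
x <- rnorm(40)
n <- length(x); xbar <- mean(x); s2 <- var(x)
sigma2_hat <- (n - 1) / n * s2

# Negative log-posterior (5) in (mu, log sigma)
neg_log_post <- function(par) {
  mu <- par[1]; log_sigma <- par[2]
  -(-n * log_sigma - exp(-2 * log_sigma) / 2 * ((n - 1) * s2 + n * (xbar - mu)^2))
}
mode_hat <- c(xbar, 0.5 * log(sigma2_hat))
h <- optimHess(mode_hat, neg_log_post)  # finite-difference Hessian at the mode
round(h, 3)  # compare to diag(c(n / sigma2_hat, 2 * n))
```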
This is a bivariate normal centered at the modes \(\widehat{\mu}\) and \(\log \widehat{\sigma}\), i.e. \(\pi (\mu ,\log \sigma |x) \approx N_2([\widehat{\mu}, \log \widehat{\sigma}], \Sigma)\), written out more completely in equation \((8)\):
\[ \pi (\mu ,\log \sigma |x) \sim N_2 \left( \left[ \begin{array}{c} \widehat{\mu} \\ \log \widehat{\sigma} \end{array} \right] , \left[ \begin{array}{cc} \frac{\widehat{\sigma}^{2}}{n} & 0 \\ 0 & \frac{1}{2n} \end{array} \right] \right). \tag{8} \]
Note that this is a bivariate normal distribution.
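Since the covariance matrix in \((8)\) is diagonal, sampling from the approximation needs no multivariate machinery: two independent `rnorm()` draws suffice. The data below are an arbitrary simulated sample used only for illustration:

```r
set.seed(3)  # arbitrary simulated data
x <- rnorm(25)
n <- length(x)
sigma2_hat <- (n - 1) / n * var(x)

# Independent draws from the two marginals of (8)
draws <- data.frame(
  mu        = rnorm(4000, mean(x), sqrt(sigma2_hat / n)),
  log_sigma = rnorm(4000, 0.5 * log(sigma2_hat), sqrt(1 / (2 * n)))
)
apply(draws, 2, quantile, probs = c(0.025, 0.975))
```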
If we use the approximation in terms of \(\pi (\mu ,\sigma ^{2}|x)\), the second partial derivative matrix would be multiplied by the Jacobian of the transformation from \(\log \sigma\) to \(\sigma ^{2}\) and the mode would be changed slightly, to
\[ \tilde{\sigma}^{2}=\frac{n}{n+2}\widehat{\sigma}^{2}. \tag{9} \]
If we wanted to restate results in terms of \(\sigma^2\) instead of \(\log \sigma\), we’d use the relationship \(\sigma^2 = e^{2 \log \sigma} \Rightarrow \frac{\partial}{\partial \log \sigma} \sigma^2 = \frac{\partial}{\partial \log \sigma} e^{2 \log \sigma} = 2e^{2 \log \sigma} = 2 \sigma^2 = J(\log \sigma \to \sigma^2)\). The variance is therefore adjusted by \(J(\log \sigma \to \sigma^2)^2 = (2\sigma^2)^2 = 4 \sigma^4\), which with the mode adjustment becomes \(\frac{4n}{n+2}\tilde{\sigma}^{4}\).
So far, the best explanation I’ve come across for mode adjustment is page 327 from Bayesian Data Analysis by Gelman: “Conditional mode of \(\log \sigma\). The conditional posterior density for \(\sigma^2\) is scaled inverse-\(\chi^2\) and given by (11.10). The mode of the conditional posterior density of \(\log \sigma\) is obtained by replacing the current estimate of \(\log \sigma\) with \(\log \widehat{\sigma}\), with \(\widehat{\sigma}\) defined in (11.11). (From Appendix A, the conditional mode of \(\sigma^2\) is \(\frac{n}{n+2}\widehat{\sigma}^2\). The factor of \(\frac{n}{n+2}\) does not appear in the conditional mode of \(\log \sigma\) because of the Jacobian factor when transforming from \(p(\sigma^2)\) to \(p(\log \sigma)\); see Exercise 12.6.)” The relevant piece of “Appendix A” referenced is on page 579 at this link.
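The \(\frac{n}{n+2}\) factor in (9) can also be checked numerically. With \(\mu\) fixed at \(\bar{x}\), a uniform prior on \(\log \sigma\) corresponds (via the Jacobian) to a density in \(\sigma^2\) proportional to \((\sigma^2)^{-(n/2+1)}\exp(-n\widehat{\sigma}^2/2\sigma^2)\), whose maximizer should be \(\tilde{\sigma}^2 = \frac{n}{n+2}\widehat{\sigma}^2\). Arbitrary simulated data again:

```r
set.seed(4)  # arbitrary simulated data
x <- rnorm(25)
n <- length(x)
sigma2_hat <- (n - 1) / n * var(x)

# Log density of sigma^2 (up to a constant), mu fixed at xbar,
# including the Jacobian term -log(sigma^2)
log_dens_sigma2 <- function(s2) {
  -(n / 2 + 1) * log(s2) - n * sigma2_hat / (2 * s2)
}
opt <- optimize(log_dens_sigma2, interval = c(0.01, 5), maximum = TRUE)
c(numeric = opt$maximum, closed_form = n / (n + 2) * sigma2_hat)
```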
This leads to updated covariance matrix:
\[ \mathscr{F}_{Obs}^{-1} = \left[ \begin{array}{cc} \frac{\widehat{\sigma}^{2}}{n} & 0 \\ 0 & \frac{4n}{n+2}\tilde{\sigma}^{4} \times \frac{1}{2n} = \frac{2 \tilde{\sigma}^4}{n + 2} \end{array} \right] = \Sigma \]
The upper-left term (\(\frac{\widehat{\sigma}^{2}}{n}\)) isn’t affected by this transformation, so it’s left untouched.
By the independence of the two components \((\mu ,\sigma ^{2})\), the approximate posterior distribution of \(\sigma ^{2}\) becomes:
\[ \pi (\sigma ^{2}|x)\approx N\left( \sigma^{2}|\tilde{\sigma}^{2},\frac{2\tilde{\sigma}^{4}}{n+2}\right) . \tag{10} \]
- Plot the posterior distribution of \(\sigma ^{2}\) in (10) by generating \(n=25\), and \(50\) observations from \(N(0,1)\). Obtain the posterior variance:
\[ Var(\sigma ^{2})_{Post}=\frac{2\tilde{\sigma}^{4}}{n+2}. \tag{11} \]
samples <- lapply(data_samples, \(sample) {
x_50 <- sample$x_50
x_25 <- sample$x_25
i <- sample$iter
mu_posterior_mean_50 <- mean(x_50)
mu_posterior_mean_25 <- mean(x_25)
sd_posterior_mean_50 <- sd(x_50)
sd_posterior_mean_25 <- sd(x_25)
mu_posterior_var_50 <- sd(x_50)^2 / 50
mu_posterior_var_25 <- sd(x_25)^2 / 25 # use the n = 25 sample, not x_50
sd_posterior_var_50 <- 2 * sd(x_50)^2 / (50 + 2)
sd_posterior_var_25 <- 2 * sd(x_25)^2 / (25 + 2)
n <- 1e5
x_50 <- data.frame(
mu_x = rnorm(n, mu_posterior_mean_50, sqrt(mu_posterior_var_50)),
sd_x = rnorm(n, sd_posterior_mean_50, sqrt(sd_posterior_var_50)),
n = 50,
iter = i,
mu_mean = mu_posterior_mean_50,
sd_mean = sd_posterior_mean_50,
mu_var = mu_posterior_var_50,
sd_var = sd_posterior_var_50
)
x_25 <- data.frame(
mu_x = rnorm(n, mu_posterior_mean_25, sqrt(mu_posterior_var_25)),
sd_x = rnorm(n, sd_posterior_mean_25, sqrt(sd_posterior_var_25)),
n = 25,
iter = i,
mu_mean = mu_posterior_mean_25,
sd_mean = sd_posterior_mean_25,
mu_var = mu_posterior_var_25,
sd_var = sd_posterior_var_25
)
rbind(x_25, x_50)
})
posterior_samples <- do.call(rbind, samples)
ggplot(posterior_samples, aes(x = sd_x, color = sd_var)) +
geom_density() +
facet_grid(paste("n =", n) ~ paste("sample rep =", iter)) +
geom_vline(aes(xintercept = sd_mean), linetype = "dotted", color = "#444444") +
labs(
title = "Posteriors for sigma (5 reps N(0, 1) w/n=25 & n=50; dotted lines at modes)",
subtitle = "In each rep, the larger sample contains the smaller sample",
x = "",
y = "posterior density",
color = "posterior variance"
)
- Also plot the approximated posterior distribution in \((8)\) for the generated \(n=25\), and \(50\) observations from \(N(0,1)\).
Instead of directly using \((8)\), I’ve leveraged independence and reused \((10)\) along with the marginal below, which comes from \((8)\):
\[ \pi (\mu | x) \sim N\left(\widehat{\mu}, \frac{\widehat{\sigma}^2}{n}\right) \]
ggplot(posterior_samples, aes(x = mu_x, color = mu_var)) +
geom_density() +
facet_grid(paste("n =", n) ~ paste("sample rep =", iter)) +
geom_vline(aes(xintercept = mu_mean), linetype = "dotted", color = "#444444") +
labs(
title = "Posteriors for mu (5 reps N(0, 1) w/n=25 & n=50; dotted lines at modes)",
subtitle = "In each rep, the larger sample contains the smaller sample",
x = "",
y = "posterior density",
color = "posterior variance"
)
mean_adj <- max(c(max(abs(posterior_samples$mu_mean)) * 1.5, 0.3))
sd_adj <- max(c(max(abs(1 - posterior_samples$sd_mean)) * 1.5, 0.3))
ggplot(posterior_samples, aes(y = mu_x, x = sd_x)) +
geom_density_2d_filled(show.legend = FALSE) +
facet_grid(paste("n =", n) ~ paste("sample rep =", iter)) +
geom_hline(aes(yintercept = mu_mean), linetype = "dotted", color = "#888888") +
geom_vline(aes(xintercept = sd_mean), linetype = "dotted", color = "#888888") +
lims(y = c(-mean_adj, mean_adj), x = c(1 - sd_adj, 1 + sd_adj)) +
coord_fixed(ratio = 1) +
labs(
title = "Posteriors for mu & sigma (5 reps N(0, 1) w/n=25 & n=50; dotted lines at modes)",
subtitle = "In each rep, the larger sample contains the smaller sample",
y = "mu", x = "sigma"
)
- Can you construct a 95% Posterior Deviance interval for \(\sigma ^{2}\)? Use your generated data for \(n=25\), and \(50\) observations from \(N(0,1)\).
Deviance is defined by
\[ D(\sigma ^{2})=2[\log L(\widehat{\sigma }^{2}) - \log L(\sigma ^{2})] \sim \chi _{1}^{2}(\alpha ). \tag{12} \]
Note: the 95th percentile of \(\chi _{1}^{2}\) is \(3.8415\).
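The threshold can be confirmed directly in R:

```r
# 0.95 quantile of a chi-squared distribution with 1 degree of freedom
qchisq(0.95, df = 1)
```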
sigma_deviance <- function(x, sigma_mle, sigma_candidate) {
mu <- mean(x)
like_1 <- log_posterior(x, mu = mu, sigma = sigma_mle)
like_2 <- log_posterior(x, mu = mu, sigma = sigma_candidate)
2 * (like_1 - like_2)
}
sigma_deviance_interval <- function(
x, posterior_mean, chi_thresh = 3.8415,
candidates_width = 1, candidates_step = 0.01) {
candidates <- seq(
from = posterior_mean - candidates_width / 2,
to = posterior_mean + candidates_width / 2,
by = candidates_step
)
lower_bound <- Inf
upper_bound <- -Inf
for (sigma_candidate in candidates) {
dev <- sigma_deviance(
x = x,
sigma_mle = posterior_mean,
sigma_candidate = sigma_candidate
)
if (abs(dev) <= chi_thresh) {
if (sigma_candidate < lower_bound) {
lower_bound <- sigma_candidate
}
if (sigma_candidate > upper_bound) {
upper_bound <- sigma_candidate
}
}
}
di <- c(lower_bound, upper_bound)
names(di) <- c("lower", "upper")
return(di)
}
interval_reps <- lapply(data_samples, \(sample) {
x_50 <- sample$x_50
x_25 <- sample$x_25
i <- sample$iter
sd_posterior_mean_50 <- sd(x_50)
sd_posterior_var_50 <- 2 * sd(x_50)^2 / (50 + 2)
sd_posterior_sd_50 <- sqrt(sd_posterior_var_50)
sd_posterior_mean_25 <- sd(x_25)
sd_posterior_var_25 <- 2 * sd(x_25)^2 / (25 + 2) # n = 25 here, not 50
sd_posterior_sd_25 <- sqrt(sd_posterior_var_25)
dev_95_int_50 <- sigma_deviance_interval(x_50, posterior_mean = sd_posterior_mean_50)
posterior_95_auc_50 <- c(
qnorm(0.025, sd_posterior_mean_50, sd_posterior_sd_50),
qnorm(0.975, sd_posterior_mean_50, sd_posterior_sd_50)
)
dev_95_int_25 <- sigma_deviance_interval(x_25, posterior_mean = sd_posterior_mean_25)
posterior_95_auc_25 <- c(
qnorm(0.025, sd_posterior_mean_25, sd_posterior_sd_25),
qnorm(0.975, sd_posterior_mean_25, sd_posterior_sd_25)
)
intervals <- data.frame(rbind(
dev_95_int_25,
posterior_95_auc_25,
dev_95_int_50,
posterior_95_auc_50
))
names(intervals) <- c("lower", "upper")
intervals$n <- factor(paste("n =", c(25, 25, 50, 50)))
intervals$iter <- factor(paste("sample rep =", i))
intervals$interval <- c("95% Deviance", "95% Posterior HDI", "95% Deviance", "95% Posterior HDI")
intervals |>
select(iter, n, interval, lower, upper)
})
interval_df <- do.call(rbind, interval_reps) |>
arrange(iter, n)
rownames(interval_df) <- NULL
offset <- 1.05 * max(abs(1 - c(interval_df$lower, interval_df$upper)))
ggplot(data = interval_df, aes(y = interval, color = interval)) +
geom_errorbar(aes(xmin = lower, xmax = upper), width = 0.2) +
geom_point(aes(x = (lower + upper) / 2)) +
facet_grid(n ~ iter) +
scale_x_continuous(limits = c(1 - offset, 1 + offset), breaks = c(0.6, 1.0, 1.4)) +
labs(
title = "95% Deviance and HDI intervals (5 reps N(0, 1) w/n=25 & n=50)",
subtitle = "In each rep, the larger sample contains the smaller sample",
y = "",
x = "sigma",
color = "Interval"
) +
theme(
axis.ticks.y = element_blank(), # Removes y-axis ticks
axis.text.y = element_blank() # Removes y-axis text
)
interval_df |>
pivot_wider(
names_from = interval,
values_from = c(lower, upper),
names_glue = "{interval} {tools::toTitleCase(.value)}"
) |>
select(1, 2, 3, 5, 4, 6)